AITopics | bounded support

Global Convergence in Training Large-Scale Transformers

Neural Information Processing Systems | Feb-10-2026, 22:47:36 GMT

Despite the widespread success of Transformers across various domains, their optimization guarantees in large-scale model settings are not well-understood.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Asia > China > Hong Kong (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Non-Asymptotic Error Bounds for Bidirectional GANs

Neural Information Processing Systems | Feb-9-2026, 02:58:51 GMT

We derive nearly sharp bounds for the bidirectional GAN (BiGAN) estimation error under the Dudley distance between the latent joint distribution and the data joint distribution with appropriately specified architecture of the neural networks used in the model.

artificial intelligence, arxiv preprint arxiv, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Macao (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Hong Kong (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Non-asymptotic Error Bounds for Bidirectional GANs

Neural Information Processing Systems | Dec-24-2025, 05:37:53 GMT

We derive nearly sharp bounds for the bidirectional GAN (BiGAN) estimation error under the Dudley distance between the latent joint distribution and the data joint distribution with appropriately specified architecture of the neural networks used in the model. To the best of our knowledge, this is the first theoretical guarantee for the bidirectional GAN learning approach. An appealing feature of our results is that they do not assume the reference and the data distributions to have the same dimensions or these distributions to have bounded support. These assumptions are commonly made in the existing convergence analysis of the unidirectional GANs but may not be satisfied in practice. Our results are also applicable to the Wasserstein bidirectional GAN if the target distribution is assumed to have bounded support. To prove these results, we construct neural network functions that push forward an empirical distribution to another arbitrary empirical distribution on a possibly different-dimensional space. We also develop a novel decomposition of the integral probability metric for the error analysis of bidirectional GANs. These basic theoretical results are of independent interest and can be applied to other related learning problems.
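For orientation, the distances named in this abstract can be written as integral probability metrics. The block below gives the standard textbook definitions only; the symbols ($\mu$, $\nu$, $\mathcal{F}$) and the normalization of the bounded Lipschitz class are assumptions made here and may differ from the paper's own conventions.

```latex
% Integral probability metric (IPM) indexed by a function class \mathcal{F}
d_{\mathcal{F}}(\mu, \nu)
  = \sup_{f \in \mathcal{F}}
    \left| \mathbb{E}_{X \sim \mu} f(X) - \mathbb{E}_{Y \sim \nu} f(Y) \right|

% Dudley (bounded Lipschitz) distance: one common normalization of the class
\mathcal{F}_{\mathrm{BL}} = \{ f : \|f\|_{\infty} + \|f\|_{\mathrm{Lip}} \le 1 \}

% Wasserstein-1 distance, via Kantorovich-Rubinstein duality
\mathcal{F}_{W_1} = \{ f : \|f\|_{\mathrm{Lip}} \le 1 \}
```

The Wasserstein class contains unbounded functions while the bounded Lipschitz class does not, which is at least consistent with the abstract requiring bounded support only in the Wasserstein case.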

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Global Convergence in Training Large-Scale Transformers

Neural Information Processing Systems | Oct-9-2025, 22:54:00 GMT

Despite the widespread success of Transformers across various domains, their optimization guarantees in large-scale model settings are not well-understood.

assumption, assumption 2, exp, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Asia > China > Hong Kong (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Concentration of risk measures: A Wasserstein distance approach

Sanjay P. Bhat, Prashanth L.A.

Neural Information Processing Systems | Oct-2-2025, 00:47:04 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, risk measure, (18 more...)

Neural Information Processing Systems

Country: Asia > India (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.89)

66be31e4c40d676991f2405aaecc6934-Paper.pdf

Neural Information Processing Systems | Aug-14-2025, 22:11:48 GMT

bidirectional gan, gan, target distribution, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Iowa > Johnson County > Iowa City (0.14)
Asia > China > Hong Kong (0.05)
Asia > China > Hubei Province > Wuhan (0.04)
Asia > Macao (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Global Convergence in Training Large-Scale Transformers

Gao, Cheng, Cao, Yuan, Li, Zihao, He, Yihan, Wang, Mengdi, Liu, Han, Klusowski, Jason Matthew, Fan, Jianqing

arXiv.org Machine Learning | Oct-30-2024

Despite the widespread success of Transformers across various domains, their optimization guarantees in large-scale model settings are not well-understood. This paper rigorously analyzes the convergence properties of gradient flow in training Transformers with weight decay regularization. First, we construct the mean-field limit of large-scale Transformers, showing that as the model width and depth go to infinity, gradient flow converges to the Wasserstein gradient flow, which is represented by a partial differential equation. Then, we demonstrate that the gradient flow reaches a global minimum consistent with the PDE solution when the weight decay regularization parameter is sufficiently small. Our analysis is based on a series of novel mean-field techniques that adapt to Transformers. Compared with existing tools for deep networks (Lu et al., 2020) that demand homogeneity and global Lipschitz smoothness, we utilize a refined analysis assuming only $\textit{partial homogeneity}$ and $\textit{local Lipschitz smoothness}$. These new techniques may be of independent interest.
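As a rough illustration of the objects named in this abstract, a Wasserstein gradient flow with weight decay is commonly written in the generic form below. This is a sketch under assumed notation ($\rho_t$ for the parameter distribution, $R$ for the risk functional, $\lambda$ for the weight decay parameter); the paper's actual PDE accounts for Transformer depth and structure and may look different.

```latex
% \rho_t : distribution of parameters \theta at training time t (mean-field limit)
% R(\rho): training risk viewed as a functional of \rho
% \lambda > 0: weight decay regularization parameter
F(\rho) = R(\rho) + \frac{\lambda}{2} \int \|\theta\|^2 \, d\rho(\theta)

% Wasserstein gradient flow of F, written as a continuity equation
\partial_t \rho_t
  = \nabla_\theta \cdot \left( \rho_t \, \nabla_\theta \frac{\delta F}{\delta \rho}(\rho_t) \right)
```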

assumption, assumption 2, exp, (16 more...)

arXiv.org Machine Learning

2410.2361

Country:

North America > United States (0.45)
Asia > China > Hong Kong (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Non-asymptotic Error Bounds for Bidirectional GANs

Neural Information Processing Systems | Oct-10-2024, 21:10:20 GMT

We derive nearly sharp bounds for the bidirectional GAN (BiGAN) estimation error under the Dudley distance between the latent joint distribution and the data joint distribution with appropriately specified architecture of the neural networks used in the model. To the best of our knowledge, this is the first theoretical guarantee for the bidirectional GAN learning approach. An appealing feature of our results is that they do not assume the reference and the data distributions to have the same dimensions or these distributions to have bounded support. These assumptions are commonly made in the existing convergence analysis of the unidirectional GANs but may not be satisfied in practice. Our results are also applicable to the Wasserstein bidirectional GAN if the target distribution is assumed to have bounded support.

bidirectional gan, empirical distribution, non-asymptotic error bound, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.33)

Privacy Amplification for the Gaussian Mechanism via Bounded Support

Hu, Shengyuan, Mahloujifar, Saeed, Smith, Virginia, Chaudhuri, Kamalika, Guo, Chuan

arXiv.org Artificial Intelligence | Mar-7-2024

Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset. These guarantees can be desirable compared to vanilla DP in real-world settings as they tightly upper-bound the privacy leakage for a $\textit{specific}$ individual in an $\textit{actual}$ dataset, rather than considering worst-case datasets. While these frameworks are beginning to gain popularity, to date, there is a lack of private mechanisms that can fully leverage the advantages of data-dependent accounting. To bridge this gap, we propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting. Experiments on model training with DP-SGD show that using bounded support Gaussian mechanisms can provide a reduction of the pDP bound $\epsilon$ by as much as 30% without negative effects on model utility.
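The abstract does not spell out the modifications, so the following is only a minimal sketch of two natural ways to give a Gaussian mechanism bounded support: clamping the noisy output to an interval, and resampling until the output falls inside it. The function names, the symmetric interval `[-bound, bound]`, and the scalar setting are all assumptions for illustration, not the paper's definitions, and no privacy accounting is performed here.

```python
import random


def rectified_gaussian_mechanism(value, sigma, bound, rng=None):
    """Add Gaussian noise, then clamp the result to [-bound, bound].

    Hypothetical helper: mass outside the interval collapses onto the endpoints.
    """
    rng = rng or random.Random()
    noisy = value + rng.gauss(0.0, sigma)
    return max(-bound, min(bound, noisy))


def truncated_gaussian_mechanism(value, sigma, bound, rng=None, max_tries=10_000):
    """Add Gaussian noise, resampling until the result lies in [-bound, bound].

    Hypothetical helper: rejection sampling yields a Gaussian renormalized
    on the interval. max_tries guards against a very unlikely interval.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        noisy = value + rng.gauss(0.0, sigma)
        if -bound <= noisy <= bound:
            return noisy
    raise RuntimeError("no sample landed inside the support; check sigma and bound")


if __name__ == "__main__":
    rng = random.Random(0)
    print(rectified_gaussian_mechanism(0.5, sigma=1.0, bound=1.0, rng=rng))
    print(truncated_gaussian_mechanism(0.5, sigma=1.0, bound=1.0, rng=rng))
```

Both variants guarantee that every released value lies in the chosen interval; they differ in where the out-of-range probability mass goes (onto the endpoints versus spread over the interior).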

bounded support, gaussian mechanism, mechanism, (14 more...)

arXiv.org Artificial Intelligence

2403.05598

Country:

North America > United States > Virginia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Non-asymptotic Convergence of Discrete-time Diffusion Models: New Approach and Improved Rate

Liang, Yuchen, Ju, Peizhong, Liang, Yingbin, Shroff, Ness

arXiv.org Machine Learning | Feb-21-2024

The denoising diffusion model has recently emerged as a powerful generative technique that converts noise into data. Theoretical convergence guarantees have mainly been studied for continuous-time diffusion models and, in the literature, have been obtained for discrete-time diffusion models only for distributions with bounded support. In this paper, we establish convergence guarantees for substantially larger classes of distributions under discrete-time diffusion models and further improve the convergence rate for distributions with bounded support. In particular, we first establish the convergence rates for both smooth and general (possibly non-smooth) distributions with finite second moment. We then specialize our results to a number of interesting classes of distributions with explicit parameter dependencies, including distributions with Lipschitz scores, Gaussian mixture distributions, and distributions with bounded support. We further propose a novel accelerated sampler and show that it improves the convergence rates of the corresponding regular sampler by orders of magnitude with respect to all system parameters. For distributions with bounded support, our result improves the dimensional dependence of the previous convergence rate by orders of magnitude. Our study features a novel analysis technique that constructs a tilting-factor representation of the convergence error and exploits Tweedie's formula for handling Taylor expansion power terms.
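For readers unfamiliar with Tweedie's formula, its usual first-order form in the diffusion setting is sketched below. The notation ($\bar\alpha_t$, $q_t$, $\epsilon$) follows common discrete-time diffusion conventions and is an assumption here; the abstract refers to handling higher-order Taylor terms, which this identity alone does not cover.

```latex
% Assumed forward noising step:
%   x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
%   \epsilon \sim \mathcal{N}(0, I)
% q_t: marginal density of x_t

% Tweedie's formula: posterior mean of the clean sample via the score of q_t
\mathbb{E}[x_0 \mid x_t]
  = \frac{1}{\sqrt{\bar\alpha_t}}
    \left( x_t + (1 - \bar\alpha_t) \, \nabla_{x_t} \log q_t(x_t) \right)
```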

assumption 4, bounded support, dq 0, (16 more...)

arXiv.org Machine Learning

2402.13901

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.13)
North America > United States > Ohio (0.04)

Genre: Research Report (1.00)

Industry: Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)
